CS224N: Investigating SMS Text Normalization using Statistical Machine Translation

Authors

  • Karthik Raghunathan
  • Stefan Krawczyk
Abstract

In this project we explore two approaches to SMS text normalization. First, we try the dictionary substitution approach used by most websites that provide such a service, and then modify it with our own extension. We then apply a statistical machine translation (MT) approach using off-the-shelf MT tools. We evaluate the performance of our system on three test sets from different sources and discuss the shortcomings of our system and results.
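
The dictionary substitution baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the lexicon entries, the tokenizer pattern, and the function name `normalize` are all hypothetical, chosen only to show the general idea of replacing known SMS tokens with their standard forms while leaving everything else untouched.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration only);
# a real system would load a much larger SMS-to-English word list.
SMS_LEXICON = {
    "u": "you",
    "r": "are",
    "c": "see",
    "2moro": "tomorrow",
    "gr8": "great",
    "thx": "thanks",
}

# Split into word tokens, punctuation runs, and whitespace runs so that
# joining the tokens back together reproduces the original spacing.
TOKEN_RE = re.compile(r"\w+|[^\w\s]+|\s+")


def normalize(message, lexicon=SMS_LEXICON):
    """Replace each token found in the lexicon; keep all other tokens as-is."""
    out = []
    for token in TOKEN_RE.findall(message):
        # Lookup is case-insensitive; the replacement is always lowercase,
        # so the original capitalization of a replaced token is lost.
        replacement = lexicon.get(token.lower())
        out.append(replacement if replacement is not None else token)
    return "".join(out)


print(normalize("c u 2moro, thx!"))  # prints: see you tomorrow, thanks!
```

A lookup of this kind is context-blind: a token such as "c" is always rewritten to the same word, even where it stands for something else. Limitations of that sort are one reason to compare it against a statistical approach.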

Similar resources

A Phrase-Based Statistical Model for SMS Text Normalization

Short Messaging Service (SMS) texts behave quite differently from normal written texts and have some very special phenomena. To translate SMS texts, traditional approaches model such irregularities directly in Machine Translation (MT). However, such approaches suffer from a customization problem, as tremendous effort is required to adapt the language model of the existing translation system to han...

A Framework for Translating SMS Messages

Short Messaging Service (SMS) has become a popular form of communication. While it is predominantly used for monolingual communication, it can be extremely useful for facilitating cross-lingual communication through statistical machine translation. In this work we present an application of statistical machine translation to SMS messages. We decouple the SMS translation task into normalization f...

A REVIEW PAPER ON SMS TEXT TO PLAIN ENGLISH TRANSLATION (Text Normalization)

Mobile technology and social networking technology play an important role in communication across the internet. A large amount of information is found in noisy contexts, as texting and chat lingo have become increasingly common over the past decade. This noisy information needs to be normalized into standard text so that it can be used by various other tools such as text-to-speec...

Rewriting the orthography of SMS messages

Electronic written texts used in computer-mediated interactions (emails, blogs, chats, and the like) contain significant deviations from the norm of the language. This paper presents the detail of a system aiming at normalizing the orthography of French SMS messages: after discussing the linguistic peculiarities of these messages and possible approaches to their automatic normalization, we pres...

CMU Haitian Creole-English Translation System for WMT 2011

This paper describes the statistical machine translation system submitted to the WMT11 Featured Translation Task, which involves translating Haitian Creole SMS messages into English. In our experiments we try to address the issue of noise in the training data, as well as the lack of parallel training data. Spelling normalization is applied to reduce out-of-vocabulary words in the corpus. Using ...

Publication date: 2009